A Note on M. N. Katehakis' and Y.-R. Chen's Computation of the Gittins Index

Author

Lodewijk C. M. Kallenberg

Abstract

In a recent paper, Katehakis and Chen propose a sequence of linear programs for the computation of the Gittins indices. If there are $N$ projects and project $\nu$ has $K_\nu$ states, then $\sum_{\nu=1}^{N} K_\nu$ linear programs have to be solved. In this note it is shown that, instead of the $K_\nu$ linear programs for project $\nu$, one parametric linear program with the same dimensions can be solved.

1. Introduction. In a recent paper, Katehakis and Chen (1984) propose a sequence of linear programs for the computation of the Gittins indices. In this note we show a computationally more favorable approach by using parametric linear programming. Wherever possible, Katehakis and Chen's notation is followed.

Consider the following version of the multi-armed bandit problem. There are $N$ projects, and project $\nu$ is at each instant of time in one of the states of the set $S_\nu = \{1, 2, \ldots, K_\nu\}$. After observing the states of each project, one project must be selected to work on. If project $\nu$ is selected at time $t$ and the state of the project is state $i$, then a reward $R_\nu(i)$ is earned and $p_\nu(j \mid i)$ denotes the probability that the state of project $\nu$ is state $j$ at the next instant of time (the states of the unselected projects are unchanged). The problem is to find a rule for selecting the project such that the expected total $\alpha$-discounted reward, for a discount factor $\alpha \in [0, 1)$, is maximized. Gittins and his co-workers have shown the existence of numbers $M_\nu(i)$, $1 \le i \le K_\nu$, $1 \le \nu \le N$, such that if at time point $t$ project $\nu$ is in state $x_\nu(t)$, $1 \le \nu \le N$, an optimal rule is to select project $\nu^*$, where
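The single-project model described in the introduction (rewards $R_\nu(i)$, transition probabilities $p_\nu(j \mid i)$, discount factor $\alpha$) can be made concrete with a small numerical sketch. The code below does not implement the linear programs of Katehakis and Chen or the parametric linear program of this note. As a stand-in it uses the restart-in-state characterization of the Gittins index, solved by plain value iteration. The function names, the example data, and the scaling by $(1 - \alpha)$ (which expresses the index in reward-per-period units and may differ from the normalization of $M_\nu(i)$ used in the note) are assumptions made for illustration.

```python
import numpy as np

def restart_value(R, P, alpha, i, tol=1e-12, max_iter=100_000):
    """Value of state i in the 'restart in state i' problem.

    In every state the controller either continues the project or
    restarts it in state i; the value is found by value iteration.
    """
    R = np.asarray(R, dtype=float)
    P = np.asarray(P, dtype=float)
    V = np.zeros(len(R))
    for _ in range(max_iter):
        cont = R + alpha * (P @ V)         # continue from each state
        V_new = np.maximum(cont, cont[i])  # or restart in state i
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    return V[i]

def gittins_indices(R, P, alpha):
    """Indices of all states of one project (illustrative scaling)."""
    return np.array([(1.0 - alpha) * restart_value(R, P, alpha, i)
                     for i in range(len(R))])

# Hypothetical three-state project, not taken from the note.
R = [1.0, 3.0, 2.0]
P = [[0.5, 0.3, 0.2],
     [0.6, 0.1, 0.3],
     [0.2, 0.4, 0.4]]
indices = gittins_indices(R, P, alpha=0.9)
```

This sketch solves one fixed-point problem per state, which mirrors the one-problem-per-state cost that the note sets out to reduce. With this scaling, the state with the largest one-step reward has an index equal to that reward, which gives a quick sanity check on the output.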

Similar resources

Computing Optimal Sequential Allocation Rules in Clinical Trials*

Michael N. Katehakis (State University of New York at Stony Brook) and Cyrus Derman (Columbia University). The problem of assigning one of several treatments in clinical trials is formulated as a discounted bandit problem that was studied by Gittins and Jones. The problem involves comparison of certain state-dependent indices. A recent characterization of the index is used to calculate more efficient...

Linear Functions Preserving Sut-Majorization on RN

Suppose $\mathbf{M}_{n}$ is the vector space of all $n$-by-$n$ real matrices, and let $\mathbb{R}^{n}$ be the set of all $n$-by-$1$ real vectors. A matrix $R \in \mathbf{M}_{n}$ is said to be row substochastic if it has nonnegative entries and each row sum is at most $1$. For $x$, $y \in \mathbb{R}^{n}$, it is said that $x$ is sut-majorized by $y$ (denoted by $x \prec_{sut} y$) if t...

Q-Learning for Bandit Problems

Multi-armed bandits may be viewed as decompositionally-structured Markov decision processes (MDPs) with potentially very large state sets. A particularly elegant methodology for computing optimal policies was developed over twenty years ago by Gittins [Gittins & Jones, 1974]. Gittins' approach reduces the problem of finding optimal policies for the original MDP to a sequence of low-dimensional stopping...

A note on convergence in fuzzy metric spaces

The sequential $p$-convergence in a fuzzy metric space, in the sense of George and Veeramani, was introduced by D. Mihet as a weaker concept than convergence. Here we introduce a stronger concept called $s$-convergence, and we characterize those fuzzy metric spaces in which convergent sequences are $s$-convergent. In such a case $M$ is called an $s$-fuzzy metric. If $(N_M,\ast)$ is a fuzzy metri...

Restart Probability Model

We discuss a new applied probability model: there is a system whose evolution is described by a Markov chain (MC) with known transition matrix on a discrete state space and at each moment of a discrete time a decision maker can apply one of three possible actions: continue, quit, and restart MC in one of a finite number of fixed “restarting” points. Such a model is a generalization of a model d...

Journal: Math. Oper. Res.

Volume: 11

Publication date: 1986
